Social media and blogs are increasingly used for social communication and as platforms for publishing ideas and thoughts. Public intentions, wisdom, problems, solutions, and mental states are shared on social media, and text is the most common way to communicate over social networks. All kinds of data are shared on social sites such as Facebook, Twitter, and other microblogs, where people from different walks of life publish thoughts and convey messages through text. Consequently, events in social life are discussed daily on social blogs. This work aims to discover ongoing social crises from Twitter data. Text mining and sentiment analysis were applied to collected Twitter data to detect current social crises, and the identified crises were then compared with reports in reputable newspapers. The hybrid method detected recent social issues well: the proposed analysis achieved identification rates of 89%, 95%, 83%, 53%, and 98% for the top five identified crises, respectively, between 27 February and 11 March 2020. The strategy used in this study to detect recent social crises can benefit social life by making identified crises easier to address.

Keywords: Text mining; Twitter
1. INTRODUCTION

The occurrence of social crises is an important concern today, and such crises are hard to eliminate quickly. A crisis in social life becomes a concern of the government and the people of a community [1]. People consider any situation that endangers social life to be a social crisis. If a crisis can be identified, its bad effects can be minimized and the problem can be eliminated. During any social crisis, reliable information is essential for solving the issue. Many organizations have already used the power of social media, but only a few teams have worked for people in social crises [2]. Some social media and microblog platforms, such as Twitter, are popular with users. Twitter is not only a source from which scholars can collect information in the form of tweets, but also a useful means of tracking any crisis occurring in social life [3]. A good amount of information can be collected from Twitter to analyze a topic and identify a crisis. When something happens in society, the first effect [4] can be seen on social media. Sometimes people do not speak openly in real life, yet large numbers of posts, blog articles, and short messages spread over the internet through social media. These posts or blog articles can be authentic, or they can be fake [5] and intended to spread rumors on social media.

Text mining [6] has taken on the challenge of resolving problems ranging from personal to business needs in this era of the internet of things (IoT). Now that people depend heavily on social media [7] for news and updates, much of their daily life is lived online: problems and solutions are posted on social media even when they are not discussed in real life. Text mining can gather the information needed to recognize a problem or to solve it. Social media text provides much of the information about what is going on in the world. Applied to unstructured data such as Facebook and Twitter text, natural language processing (NLP) techniques [8] such as topic detection [9] can help recognize the most discussed subjects in social life. Sentiment analysis combined with text mining [10] has been applied to a large number of problems and can be used in many different ways to track a target. Researchers have performed sentiment analysis with various lexicon-based methods [11], such as valence aware dictionary for sentiment reasoning (VADER), and machine learning techniques [12], such as support vector machines and Naive Bayes, using features such as unigrams and term frequency-inverse document frequency (TF-IDF).

Sentiment analysis [13] is bringing about significant change in this era by identifying negative and positive sentiment with reasonable accuracy. Sentiment matters everywhere, from private work to public organizations, and it has become a topic of concern for scholars [14]. Since people use social media regularly to publish their thoughts and problems, this study's objectives are to: i) identify the topics discussed frequently over a number of days; ii) analyze the sentiment behind these topics; and iii) identify the topic that constitutes a crisis for a specific region and time. Twitter was chosen for text mining and sentiment analysis for the following reasons: i) Twitter is used by millions of users; ii) Twitter is usually used by literate people; iii) Twitter is used all over the world; iv) Twitter data are free and easy to obtain; v) Twitter data provide age, sex, region, and date; and vi) most researchers find Twitter data valuable. In this paper, a hybrid method is proposed to identify the recent major social crisis in a specific region and time. Using Orange 3, an open-source data mining application [15], this study applied the Liu-Hu sentiment analysis method for the following reasons: i) it provides reasonable accuracy [16] on the dataset of this work; ii) the Liu-Hu method in Orange is free; iii) Liu-Hu in Orange is easy to use for sentiment analysis; iv) it does not need a training dataset; and v) it offers a clear view of negative and positive sentiment for a specific topic. The bag-of-words model, implemented in Spyder, is used to classify text, count word occurrences in social media posts, and identify the frequently discussed topics for the following reasons: i) bag-of-words is simple to understand; ii) it is easy to implement; iii) it works fast on our dataset; and iv) many scholars prefer bag-of-words for text classification [17].

In the last few years, social media text mining and social media emotion mining have become a focus for researchers. Even so, gaps remain, as the limitations of existing studies show. From education [18] to tourism [19], social media text [20] and sentiment analysis are now used to support correct decisions [21], to cluster types of short text structures, and to predict [22] various factors.
Alvermann [23] discussed critical inquiry into social media text. Thelwall [24] used social media text to detect the magnitude of stress and relaxation. Singh et al. [25] discussed ongoing trends in social media. Sarker [26] presented a customizable pipeline for social media text. Ariffin and Tiun [27] focused on part-of-speech tagging. Pinto et al. [28] compared the performance of NLP toolkits on social media and regular text. Eryigit and Torunoglu-Selamet [29] discussed social media text normalization. Hee et al. [30] carried out automated cyberbullying detection. Uteuov and Kalyuzhnaya [31] combined hierarchical topic modeling and document embedding for social media. Han et al. [32] discussed lexical normalization. Wu et al. [33] discussed deep learning for sentiment analysis of social media text; that study needs further validation, and a comparative analysis could interpret online opinions across various microblogs. Their work can be extended to sentiment analysis for crisis detection from social media. Mansour [34] studied people's views on social media about the Islamic State of Iraq and Syria (ISIS) using sentiment analysis and text mining. Multilingual tweet analysis and the conflict between ISIS and the rest of the world were not covered, and other topics, such as social problem identification, can be studied following that work. Models for sentiment and emotion mining were proposed in [35], [36]; a suggestion-based recommender system could extend that work with a larger dataset. Ranganathan and Tzacheva [37] extracted action rules from user emotions to provide suggestions that improve users' feelings and help them lead healthier lives; a learner evaluation system for teachers could be built on that work. Lexicon-based and machine learning approaches were combined [38] to study product review sentiment, although emoji were not included in the analyzed text.
Derakhshan and Beigy [39] improved the accuracy of stock price movement prediction using sentiment analysis of stock-related social media; a market simulator could be implemented to estimate profit and loss. Mostafa [40] analyzed brand sentiment on social networks, which is valuable to marketing research companies. Extracting the most representative topics was left outside that study's scope, and only 3,500 tweets were considered. Canales et al. [41] used a semi-automatic method for emotion detection in social media text; more testing, new manual annotation with additional annotators, and a larger amount of data are needed. Chekima and Alfred [42] developed a sentiment analysis process for Malay text; only Malay text was analyzed, and other languages could be considered in new work. Shapiro et al. [43] measured news sentiment and built a sentiment scoring model, using a sample of articles rated by humans on a positivity-negativity scale, and compared its predictive accuracy with a set of sentiment analysis models; the model combines existing lexicons and accounts for negation. Higher accuracy is still needed in further work. Comparing it with support vector machines, Naive Bayes, maximum entropy, linguistic inquiry and word count (LIWC), affective norms for English words (ANEW), the General Inquirer, and SentiWordNet, Hutto and Gilbert [44] showed that VADER achieves high F1 classification accuracy for social media text sentiment analysis. Suppala and Rao [45] measured customers' opinions and perceptions through sentiment analysis of social media data using a naive Bayes algorithm; that study used a Twitter feature dataset, and more features could be added to enhance it.

A chapter on mining unstructured text [46] discussed NLP techniques, relation extraction tools, topic modeling, and deep learning. Market prediction using social media text [47] can support strategic decisions, although standardization and comparative performance evaluation are still needed. Lia et al. [48] proposed a real-time monitoring tool, based on social media mining, for investigating changing customer needs in product planning. The approach was applied to only one target product; analyses in other disciplines remain possible.

Many research gaps remain. In particular, to our knowledge no study has identified the current major social concern for specific recent dates and regions. This study aims to identify ongoing social issues through social media text mining and sentiment analysis [48] and to confirm whether a target topic is a social crisis. This paper concentrates on detecting an ongoing social crisis in an area using both a rule-based method and a machine learning algorithm. The remainder of this paper is organized as follows: section 2 presents the proposed method, including the crisis identification architecture, text mining based topic identification, and sentiment analysis. Section 3 describes data collection and processing, evaluates the results, and discusses limitations and future work. Section 4 concludes the paper.


2. METHOD 

This study developed a method to find recent major social crises using lexical and machine learning techniques. The general strategy is first to choose the date range over which the crisis is to be identified; this research targets the last 14 days. Next, the keywords used frequently on social media are identified with Tweeplers, an online trend detection tool for Twitter microblog posts. Here, common keywords are those used frequently in user posts on a social site. After the common keywords are identified with Tweeplers, Twitter is searched for them through the application programming interface (API) provided by Twitter. When the target is a country or subcontinent, a region name must be added, so each search combines the region name with the common keywords. In this research, Bangladesh was added to the common keywords as the target country. The collected dataset must then be cleaned and preprocessed, as discussed in section 3.1. The cleaned dataset is used for topic detection with the bag-of-words model, which performed well for topic identification in this study (section 2.1). Identifying topics alone does not reveal which of them are major social concerns, so the classified and prioritized topics are next used for sentiment analysis, which sorts them into negative and positive groups to determine the crisis. This study used the Liu-Hu model for sentiment analysis (section 2.2). Finally, after the sentiment of the topics is analyzed, the results are compared with reports in several top, reliable online newspapers of the target country to recognize the major social crisis during those days. The crisis identification architecture is shown in Figure 1.
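The time window and query construction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the helper names `collection_window` and `build_queries` are hypothetical, the 14-day window is taken to include its end date (which reproduces 27 February to 11 March 2020), and the call that actually sends the queries to the Twitter API is deliberately left out.

```python
from datetime import date, timedelta
from typing import List, Sequence, Tuple


def collection_window(end: date, days: int = 14) -> Tuple[date, date]:
    """Return the first and last day of a `days`-long window ending on `end` (inclusive)."""
    return end - timedelta(days=days - 1), end


def build_queries(keywords: Sequence[str], regions: Sequence[str]) -> List[str]:
    """Pair every common keyword with every region name (hypothetical helper).

    Multi-word keywords are quoted so they match as an exact phrase; in
    Twitter's search syntax, space-separated terms must all appear.
    """
    queries = []
    for keyword in keywords:
        term = f'"{keyword}"' if " " in keyword else keyword
        for region in regions:
            queries.append(f"{term} {region}")
    return queries


start, end = collection_window(date(2020, 3, 11))
print(start, end)  # 2020-02-27 2020-03-11
print(build_queries(["crisis", "want justice"], ["Bangladesh"]))
```

Because February 2020 had 29 days, an inclusive 14-day window ending on 11 March starts on 27 February, matching the study period.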


2.1. Text mining based topic identification 

Bag-of-words, a text representation widely used in machine learning, was used for Twitter text mining. This study used bag-of-words to count word occurrences in the tweets. First, the sentences are tokenized with the natural language toolkit (NLTK). String substitution is used to normalize whitespace and to remove hypertext markup language (HTML) tags. Each word is converted to lowercase before lemmatization and corpus creation. The corpus is then used to count repeated words, increasing a word's count by 1 at each occurrence. Algorithm 1 states the rule for counting repeated words in a dataset as (1).
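A minimal sketch of these preprocessing steps is shown below. It assumes plain Python regular expressions as a stand-in for the NLTK tokenizer so that the example runs without downloading NLTK data; the function name `preprocess` and its `lemmatize` parameter are hypothetical, and the identity default leaves words unchanged.

```python
import html
import re
from typing import Callable, List

TAG_RE = re.compile(r"<[^>]+>")       # HTML tags such as <b> or </p>
SPACE_RE = re.compile(r"\s+")         # runs of spaces, tabs, and newlines
TOKEN_RE = re.compile(r"[a-z0-9']+")  # simple stand-in for an NLTK tokenizer


def preprocess(text: str, lemmatize: Callable[[str], str] = lambda w: w) -> List[str]:
    """Strip HTML, normalize whitespace, lowercase, tokenize, then lemmatize."""
    text = html.unescape(text)              # turn entities like &amp; into characters
    text = TAG_RE.sub(" ", text)            # remove HTML tags
    text = SPACE_RE.sub(" ", text).strip()  # normalize whitespace
    tokens = TOKEN_RE.findall(text.lower())
    return [lemmatize(token) for token in tokens]


corpus = [preprocess(t) for t in ["<p>Coronavirus   cases in <b>Dhaka</b></p>"]]
print(corpus)  # [['coronavirus', 'cases', 'in', 'dhaka']]
```

To lemmatize as the text describes, one could pass NLTK's `WordNetLemmatizer().lemmatize` as `lemmatize` (after `nltk.download("wordnet")`); the resulting token lists form the corpus consumed by the word counting step.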

Here, MajorIssue holds the count of each word. Each distinct word in the tweets gathered from Twitter is counted separately: its count starts at 0 and increases by 1 each time the word appears; otherwise it keeps its earlier value. Repeating this over the whole dataset yields a count for every word, and the most frequently used keyword holds the maximum value. The words are then sorted by count in descending order to prioritize the topics. Table 1 shows the prioritized tweet data, with the most frequently used keyword at the top of the list, followed by the second, and so on. The top 10 prioritized topics from this table are used in this research to identify the major issues on the microblogging site Twitter. Some entries in the table, however, carry no meaningful value. The data are therefore processed again for topic classification to remove irrelevant and unnecessary topics, as addressed in the data collection and processing section (section 3.1). After excluding extraneous words that carry no sentiment value, such as country and place names, the identified topics are shown in Table 2.


Algorithm 1

MajorIssue = {}
for each word in the tokenized tweets:
    if word not in MajorIssue: MajorIssue[word] = 1
    else: MajorIssue[word] += 1        (1)
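Algorithm 1 and the descending-order prioritization can be written as the following Python sketch. The function names, the `exclude` set (standing in for the removal of country and place names), and the toy tweets are illustrative assumptions, not the study's code.

```python
from typing import Dict, Iterable, List, Tuple


def count_words(token_lists: Iterable[List[str]]) -> Dict[str, int]:
    """Algorithm 1: store 1 on a word's first occurrence, add 1 on every repeat."""
    major_issue: Dict[str, int] = {}
    for tokens in token_lists:
        for word in tokens:
            if word not in major_issue:
                major_issue[word] = 1
            else:
                major_issue[word] += 1
    return major_issue


def prioritize(major_issue: Dict[str, int],
               exclude: Iterable[str] = (),
               top_n: int = 10) -> List[Tuple[str, int]]:
    """Drop excluded words, then sort the rest by count, highest first."""
    excluded = set(exclude)
    kept = [(w, c) for w, c in major_issue.items() if w not in excluded]
    return sorted(kept, key=lambda item: item[1], reverse=True)[:top_n]


tweets = [["modi", "pakistan", "modi"], ["coronavirus", "modi", "dhaka"], ["pakistan", "coronavirus"]]
counts = count_words(tweets)
print(prioritize(counts, exclude={"pakistan"}, top_n=2))  # [('modi', 3), ('coronavirus', 2)]
```

`collections.Counter` would produce the same counts in one call; the explicit branch is kept here only to mirror (1).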


[Figure 1 diagram: inputs "Recent common words" and "Region name"; stages "Identify crisis" and "Compare"]

Figure 1. Crisis identification architecture


Table 1. Prioritized topics 


Key          Type  Size  Value
Modi         int   1     6,534
Pakistan     int   1     4,997
Coronavirus  int   1     4,846
Hindu        int   1     4,720
March        int   1     3,657
Temple       int   1     3,037
Dhaka        int   1     2,772
Place        int   1     2,608
Islamist     int   1     2,517


Table 2. Identified topics 


Key          Type  Size  Value
Modi         int   1     4,997
Coronavirus  int   1     4,720
Hindu        int   1     3,657
March        int   1     3,037
Temple       int   1     2,772
Dhaka        int   1     2,757
Islamist     int   1     2,717


2.2. Sentiment analysis

Some studies have applied sentiment analysis techniques in the crisis domain to detect the sentiment of posts about disaster management. Here, sentiment analysis is used to identify the sentiment of the classified, prioritized topics. This study used the lexicon-based Liu-Hu sentiment analysis model in Orange, an open-source, desktop-based data visualization and analysis tool used by many researchers. The Liu-Hu method uses sentiment modules from NLTK. Two types of results are considered for our dataset: positive sentiment and negative sentiment. If the sentences relevant to a topic yield a mostly positive value, the topic is excluded; negative values are what identify a crisis. Figure 2 and Table 3 present the sentiment analysis of the classified topics. All identified topics were analyzed in this way in Orange using the Liu-Hu method, as shown in Figure 2, which assigns each sentence a sentiment value between -10 and +10.
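The core idea of lexicon-based scoring can be illustrated with the sketch below. It is a deliberate simplification, not Orange's implementation: the two word sets are tiny hypothetical stand-ins for the Hu and Liu opinion lexicon (which NLTK exposes as `nltk.corpus.opinion_lexicon` after `nltk.download("opinion_lexicon")`), and the raw score is a simple count difference, whereas Orange reports values on its own scale and may normalize them differently.

```python
import re
from typing import Set

# Tiny hypothetical word lists, for illustration only. The Liu-Hu method relies
# on the full Hu and Liu opinion lexicon (several thousand words).
POSITIVE: Set[str] = {"good", "peaceful", "safe", "support"}
NEGATIVE: Set[str] = {"attack", "violence", "fear", "crisis", "killed"}


def sentence_score(sentence: str) -> int:
    """Lexicon score: number of positive words minus number of negative words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    return positive - negative


def polarity(score: float) -> str:
    """Map a score to the positive / neutral / negative classes used in Table 3."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for s in ["Fear and violence after the attack", "A peaceful and safe city", "The meeting is on Monday"]:
    print(sentence_score(s), polarity(sentence_score(s)))
```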
Figure 2. Sentiment analysis using Liu-Hu in Orange


For each topic, the total number of related sentences in the dataset is recorded, but positive sentences carry no weight in this study: they do not make a topic a crisis, so the possible crisis is marked No. Likewise, neutral sentences, which receive a score of exactly 0, indicate neither positive nor negative sentiment; they are treated like positive values, and the possible crisis is marked No. Negative sentences indicate that the topic may be a crisis for a region at a specific time. The average sentiment value is the main criterion for determining a crisis, as shown in Table 3. Once the average sentiment value is calculated, positive and neutral topics are eliminated (Table 4). The remaining negative topics are then ranked by frequency count: the most frequently discussed topic with a negative sentiment value is ranked as crisis 1, the next as crisis 2, and so on. Table 5 shows the identified major crises ordered by sentiment level. With positive topics excluded, the average sentiment reflects the effect of public posts in this study.
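The decision rule above can be sketched as follows, using the keyword counts and average sentiment values from Table 3. The helper names are hypothetical. Treating a topic as a possible crisis only when its average sentiment is below 0 follows the rule stated in the text; under that rule Dhaka is excluded, as in Table 4.

```python
from statistics import mean
from typing import List, NamedTuple


class Topic(NamedTuple):
    keyword: str
    mentions: int          # "Keyword count" column of Table 3
    avg_sentiment: float   # average sentence score for the topic


def topic_from_scores(keyword: str, sentence_scores: List[float]) -> Topic:
    """Summarize per-sentence scores (positive + neutral + negative) into one topic row."""
    return Topic(keyword, len(sentence_scores), mean(sentence_scores))


def is_possible_crisis(topic: Topic) -> bool:
    # Positive and neutral (exactly 0) averages are treated as "no crisis".
    return topic.avg_sentiment < 0


def rank_by_frequency(topics: List[Topic]) -> List[str]:
    """Keep negative topics; the most frequently discussed comes first (as in Table 4)."""
    negative = [t for t in topics if is_possible_crisis(t)]
    return [t.keyword for t in sorted(negative, key=lambda t: t.mentions, reverse=True)]


def rank_by_sentiment(topics: List[Topic]) -> List[str]:
    """Keep negative topics; the most negative comes first (as in Table 5)."""
    negative = [t for t in topics if is_possible_crisis(t)]
    return [t.keyword for t in sorted(negative, key=lambda t: t.avg_sentiment)]


table3 = [
    Topic("Dhaka", 27_867, 0.74961),
    Topic("Modi", 6_767, -2.475996972),
    Topic("Coronavirus", 4_351, -2.10068),
    Topic("Hindu", 3_476, -5.23379),
    Topic("Temple", 2_762, -3.0303),
    Topic("Islamist", 2_516, -6.5698),
    Topic("March", 1_664, 0.716042),
]
print(rank_by_frequency(table3))  # ['Modi', 'Coronavirus', 'Hindu', 'Temple', 'Islamist']
print(rank_by_sentiment(table3))  # ['Islamist', 'Hindu', 'Temple', 'Modi', 'Coronavirus']
```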


Table 3. Identified possible crisis 


Date range                Keyword      Total tweets  Keyword count  Positive  Neutral  Negative  Avg sentiment  Possible crisis
2020-02-27 to 2020-03-11  Dhaka        28,264        27,867         4,932     15,670   7,265     0.74961        Yes
2020-02-27 to 2020-03-11  Modi         28,264        6,767          1,044     2,937    2,786     -2.475996972   Yes
2020-02-27 to 2020-03-11  Coronavirus  28,264        4,351          342       1,923    2,086     -2.10068       Yes
2020-02-27 to 2020-03-11  Hindu        28,264        3,476          149       543      2,784     -5.23379       Yes
2020-02-27 to 2020-03-11  Temple       28,264        2,762          47        488      2,227     -3.0303        Yes
2020-02-27 to 2020-03-11  Islamist     28,264        2,516          38        214      2,264     -6.5698        Yes
2020-02-27 to 2020-03-11  March        28,264        1,664          708       706      250       0.716042       No


Table 4. Eliminated positive and neutral topics 


Date range                Keyword      Total tweets  Keyword count  Positive  Neutral  Negative  Avg sentiment  Rank
2020-02-27 to 2020-03-11  Modi         28,264        6,767          1,022     2,937    2,786     -2.475996972   1
2020-02-27 to 2020-03-11  Coronavirus  28,264        4,351          342       1,923    2,086     -2.10068       2
2020-02-27 to 2020-03-11  Hindu        28,264        3,476          149       543      2,784     -5.23379       3
2020-02-27 to 2020-03-11  Temple       28,264        2,762          47        488      2,227     -3.0303        4
2020-02-27 to 2020-03-11  Islamist     28,264        2,516          38        214      2,264     -6.5698        5


Table 5. Identified major crisis sentiment level 


Date range                Keyword      Total tweets  Keyword count  Positive  Neutral  Negative  Avg sentiment  Rank
2020-02-27 to 2020-03-11  Islamist     28,264        2,516          38        214      2,264     -6.5698        1
2020-02-27 to 2020-03-11  Hindu        28,264        3,476          149       543      2,784     -5.23379       2
2020-02-27 to 2020-03-11  Temple       28,264        2,762          47        488      2,227     -3.0303        3
2020-02-27 to 2020-03-11  Modi         28,264        6,767          1,022     2,937    2,786     -2.475996972   4
2020-02-27 to 2020-03-11  Coronavirus  28,264        4,351          342       1,923    2,086     -2.10068       5


3. RESULTS AND DISCUSSION 
3.1. Data collection and processing 

The initial data collection query was run on Twitter for the period from 11 February 2020 to 11 March 2020. Tweets were collected through the API that Twitter provides to developer accounts. The keywords “crisis”, “attention”, “Dhaka”, “virus”, “issue”, “religion”, “coronavirus”, “want justice”, and “problem” were used to retrieve data from the microblog, and “Bangladesh” and “Dhaka” were added to these keywords as the target location for identifying the crisis in March 2020. By the last day of collection, 28,264 tweets had been collected for the region of Bangladesh using these keywords. The data were then processed with Trifacta, an online data wrangling tool, to obtain clean, structured data. The following preparation steps were applied in Trifacta: i) removed unwanted columns; ii) trimmed whitespace; iii) removed URLs; iv) removed symbols; v) removed accents from text; vi) removed the hash (#) from hashtags; vii) removed usernames from the text; viii) removed retweets; and ix) removed the at (@) sign from posts. The processed data were then used to identify the social crisis for Bangladesh.
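The Trifacta recipe itself is not given, so the following is only an approximate Python equivalent of the listed cleaning steps. It assumes each tweet is a dictionary with hypothetical `created_at` and `text` fields and that retweets are recognized by a leading "RT @". Accents are stripped before symbols so that, for example, "é" becomes "e" rather than being deleted; note that the symbol pattern keeps only ASCII letters and digits, which suits English tweets but would also erase Bengali script.

```python
import re
import unicodedata
from typing import Dict, Iterable, List

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")           # a username together with its @ sign
SYMBOL_RE = re.compile(r"[^A-Za-z0-9\s]")  # anything that is not a letter, digit, or space
SPACE_RE = re.compile(r"\s+")


def strip_accents(text: str) -> str:
    """Decompose accented letters and drop the combining marks (é -> e)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean_text(text: str) -> str:
    text = URL_RE.sub(" ", text)            # iii) remove URLs
    text = MENTION_RE.sub(" ", text)        # vii), ix) remove usernames and the @ sign
    text = text.replace("#", "")            # vi) keep the hashtag word, drop '#'
    text = strip_accents(text)              # v) remove accents
    text = SYMBOL_RE.sub(" ", text)         # iv) remove remaining symbols
    return SPACE_RE.sub(" ", text).strip()  # ii) trim and normalize whitespace


def clean_dataset(rows: Iterable[Dict[str, str]],
                  keep: tuple = ("created_at", "text")) -> List[Dict[str, str]]:
    cleaned = []
    for row in rows:
        if row["text"].startswith("RT @"):                   # viii) drop retweets
            continue
        out = {key: row[key] for key in keep if key in row}  # i) drop unwanted columns
        out["text"] = clean_text(row["text"])
        cleaned.append(out)
    return cleaned


sample = [
    {"id": "1", "created_at": "2020-03-01", "text": "Café crisis in #Dhaka @user1 https://t.co/abc !!"},
    {"id": "2", "created_at": "2020-03-01", "text": "RT @user2: old news"},
]
print(clean_dataset(sample))  # [{'created_at': '2020-03-01', 'text': 'Cafe crisis in Dhaka'}]
```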


3.2. Evaluation 

The main goal of this study was to identify recent social crises from social media text. To evaluate the method, the study conducted two appraisals. First, reliable Bangladeshi newspapers published from 11 March 2020 to 15 March 2020 were collected. The study searched for the topics in 11 randomly chosen newspapers, including The Daily Star, Daily Observer, Dhaka Tribune, Daily Sun, Bangladesh Today, The Financial Express, and Prothom Alo. The study then focused on newspaper articles whose headlines matched our identified topics. Each matching article was read manually and scored as positive or negative based on the effect of the topic it discussed; financial, social, religious, and political effects were considered in this assessment. When an article was not clear enough to score, it was marked as zero (neutral), and topics that did not appear in the newspapers were also scored as neutral. Analyzing the four days of news following the data collection period showed the effect of those topics on social life. Across all the newspapers, the topics were cited 44 times in total. Only the negative scores of the topics are used here. The newspaper results were then compared with our identified topics, as shown in Figure 3; the results appear to match the predictions of this research. Based on word frequency and sentiment analysis, the study identified the crises in line with the newspaper coverage, and the sentiment scores closely matched the sentiment level of local news reported in the newspapers. This research focused on recent crises, identified through negative and positive sentiment values.


[Figure 3 chart: "Compared chart of newspaper and test data"; series: Newspaper, Test data; topics: Modi, Corona, Hindu, Temple, Islamist]

Figure 3. Comparison chart with newspaper


Table 6 shows the confusion matrix of the collected and verified data. The correct identification rates are 89%, 95%, 83%, 53%, and 98% for 'Modi', 'Corona', 'Hindu', 'Temple', and 'Islamist', respectively. The 53% rate for one topic was accepted in the newspaper comparison because some topics are never picked up by the news media, for various reasons, even though they can be crisis-making and widely discussed on social media. Frequently used negative words were taken as the test data, and the newspaper data as the ground truth. A rate over 0.50% was taken as 1. The total keyword count was calculated from the data, and the number of neutral words was subtracted from it. The count of frequent negative (or positive) keywords was then divided by this remainder and multiplied by 100 to obtain the negative (or positive) rate. The formulas used for identifying cases were:


$$S = \frac{nkw}{kw - n} \times 100 \qquad (2)$$
Equation (2) is used to identify negative keywords in both the test data and the newspaper data, and (3) is used analogously for positive keywords, with the positive keyword count in place of nkw. Here, S is the result, nkw is the number of negative keywords, kw is the total number of keywords for the specific topic, and n is the number of neutral words. This study identified true-positive, false-positive, true-negative, and false-negative data; Table 6 shows the resulting counts.
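Equation (2) translates directly into code. In this sketch the parameter names are hypothetical stand-ins for nkw, kw, and n, the guard against a non-positive denominator is an added assumption, and the example counts are made up for illustration rather than taken from the study.

```python
def identification_rate(keyword_count: int, total_keywords: int, neutral_words: int) -> float:
    """Equation (2): S = nkw / (kw - n) * 100, returned as a percentage.

    Passing the negative keyword count gives (2); passing the positive
    keyword count instead gives the analogous positive rate.
    """
    remaining = total_keywords - neutral_words
    if remaining <= 0:
        raise ValueError("total keywords must exceed the number of neutral words")
    return keyword_count / remaining * 100


# Hypothetical counts: 30 negative keywords out of 100 keywords, 40 of them neutral.
print(identification_rate(30, 100, 40))  # 50.0
```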


Table 6. Confusion matrix 

Topic     TN  TP  FN  FP
Modi      73  16  0   11
Corona    86  9   0   5
Hindu     80  5   15  0
Temple    50  3   47  0
Islamist  98  0   0   2
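The text does not state exactly how the identification rates are derived from Table 6. Assuming, for illustration, that the rate is the share of correctly classified cases, (TN + TP) divided by all cases, the Table 6 counts give 89%, 95%, 53%, and 98% for Modi, Corona, Temple, and Islamist, matching the reported rates, and 85% for Hindu. The function name below is hypothetical.

```python
from typing import Dict, Tuple

# (TN, TP, FN, FP) per topic, copied from Table 6.
CONFUSION: Dict[str, Tuple[int, int, int, int]] = {
    "Modi": (73, 16, 0, 11),
    "Corona": (86, 9, 0, 5),
    "Hindu": (80, 5, 15, 0),
    "Temple": (50, 3, 47, 0),
    "Islamist": (98, 0, 0, 2),
}


def correct_rate(tn: int, tp: int, fn: int, fp: int) -> float:
    """Share of correctly classified cases, (TN + TP) / all cases, in percent."""
    return 100 * (tn + tp) / (tn + tp + fn + fp)


for topic, counts in CONFUSION.items():
    print(topic, correct_rate(*counts))
```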


3.3. Limitation and future work 

Changes in the organizational structure of Twitter might affect its users. In future work, we may study social crises during the COVID-19 period [49] using Twitter data and investigate users' probable intentions [50]. In addition, a combination of machine learning and non-machine learning models could be applied in further development of this work [51].


4. CONCLUSION 

A crisis detection approach was proposed in this study for identifying ongoing social problems. The dates between 27 February 2020 and 11 March 2020 were selected for data collection to determine the crisis at that time. Local newspapers were used to validate the crises detected by this study, and Bangladesh was used as the target region. This study used the machine learning-based bag-of-words method and the lexicon-based Liu-Hu sentiment analysis method, which performed well on the collected dataset. As an extension of this work, it is important to develop an algorithm that provides decision support based on the recognized social crises. A decision support system built on this work could help eliminate a crisis or reduce its effect on social life. We expect that this approach will help detect crises from social media and reduce problems for both the public and the government of a given region.